pacman::p_load(olsrr, corrplot, ggpubr, sf, spdep, GWmodel, tmap, tidyverse, gtsummary)Take-home Exercise 3: Modelling Geography of Financial Inclusion with Geographically Weighted Methods
1 Overview
Tanzania is one of the Africaβs fastest growing economies with nearly 7 percent annual national GDP growth since 2000. Despite the robust economic growth, widespread poverty persists. Approximately 49% of Tanzaniaβs population lives below the international extreme poverty line. This could be due to the lack of financial inclusion - the availability and equality of opportunities to access financial services such as banking, loan, equity and insurance products.
1.1 Objective
The objective of this exercise is to build an exploratory model that determines factors affecting financial inclusion by using non-spatial regression and geographically weighted regression methods.
Examples of independent factors include demographics (rural vs urban), marital status, gender, financial education, access to technology etc.
Examples of dependent factors include savings, financial situation, free access to banks, insurance coverage etc.
2 Getting Started
The code chunk below installs and loads olsrr, corrplot, ggpubr, sf, spdep, GWmodel, tmap, tidyverse, gtsummary packages into R environment:
3 Data Preparation
For the purpose of this exercise, we will be using two data sets:
District level boundary GIS data for Tanzania
FinScope Tanzania 2023 Survey results
3.1 Importing data into R Environment
GB = st_read(dsn = "data/rawdata/geospatial", layer = "geoBoundaries-TZA-ADM2")Reading layer `geoBoundaries-TZA-ADM2' from data source
`D:\2. SMU - MITB\Term 4\ISSS626 Geospatial\YX-Leng\ISSS626-Geospatial\Take-home_Ex\Take-home_Ex03\data\rawdata\geospatial'
using driver `ESRI Shapefile'
Simple feature collection with 170 features and 5 fields
Geometry type: MULTIPOLYGON
Dimension: XY
Bounding box: xmin: 29.58953 ymin: -11.76235 xmax: 40.44473 ymax: -0.983143
Geodetic CRS: WGS 84
π‘ Updating CRS information
st_crs(GB) <- 4210Warning: st_crs<- : replacing crs does not reproject data; use st_transform for
that
GB <- st_transform(GB, crs = 32736)
head(GB, 5)Simple feature collection with 5 features and 5 fields
Geometry type: MULTIPOLYGON
Dimension: XY
Bounding box: xmin: 694340.1 ymin: 9563538 xmax: 992833.3 ymax: 9762474
Projected CRS: WGS 84 / UTM zone 36S
shapeName shapeISO shapeID shapeGroup shapeType
1 Arusha <NA> 72390352B32479700182608 TZA ADM2
2 Arusha Urban <NA> 72390352B90906351205470 TZA ADM2
3 Karatu <NA> 72390352B22674606658861 TZA ADM2
4 Longido <NA> 72390352B95731720096997 TZA ADM2
5 Meru <NA> 72390352B99598192663387 TZA ADM2
geometry
1 MULTIPOLYGON (((929196 9602...
2 MULTIPOLYGON (((918731.5 96...
3 MULTIPOLYGON (((814868.8 96...
4 MULTIPOLYGON (((873372.8 96...
5 MULTIPOLYGON (((929196 9602...
Tanz_raw = read_csv("data/rawdata/aspatial/FinScope Tanzania 2023_Individual Main Data_FINAL.csv")Warning: One or more parsing issues, call `problems()` on your data frame for details,
e.g.:
dat <- vroom(...)
problems(dat)
Rows: 9915 Columns: 721
ββ Column specification ββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
Delimiter: ","
chr (703): reg_name, dist_name, ward_code1, ward_name, ea_code, clustertype,...
dbl (13): SN, reg_code, dist_code, c8c, D6_1_1, D6_1_2, D6_1_3, gov_3, cmg4...
lgl (5): e_5_1, e_5_2, g_5_2__5, g_5_2__13, serv2_4
βΉ Use `spec()` to retrieve the full column specification for this data.
βΉ Specify the column types or set `show_col_types = FALSE` to quiet this message.
head(Tanz_raw, 5)# A tibble: 5 Γ 721
SN reg_name reg_code dist_code dist_name ward_code1 ward_name ea_code
<dbl> <chr> <dbl> <dbl> <chr> <chr> <chr> <chr>
1 4529 Mwanza 19 7 Misungwi 251 Mondo 004
2 4245 Kagera 18 7 Missenyi 011 Kakunyu 001
3 8149 Mbeya 12 3 Kyela 283 Nkuyu 301
4 6763 Dodoma 1 3 Kongwa 123 Kibaigwa 301
5 7805 Dar es Salaam 7 2 Ilala 252 Majohe 029
# βΉ 713 more variables: clustertype <chr>, c1 <chr>, c2 <chr>, c7 <chr>,
# c8c <dbl>, c9 <chr>, c10 <chr>, c11 <chr>, c8n_a1 <chr>, c8n_a2 <chr>,
# c8n_b1 <chr>, c8n_b2 <chr>, c8n_c1 <chr>, c8n_c2 <chr>, c8n_d1 <chr>,
# c8n_d2 <chr>, c8n_e1 <chr>, c8n_e2 <chr>, c8n_f1 <chr>, c8n_f2 <chr>,
# c12_1 <chr>, c12_2 <chr>, c12_3 <chr>, c12_4 <chr>, c22 <chr>, c14 <chr>,
# c15 <chr>, C16 <chr>, c16_unit <chr>, c17_1__1 <chr>, c17_1__2 <chr>,
# c17_1__3 <chr>, c17_1__4 <chr>, c17_1__5 <chr>, c17_1__6 <chr>, β¦
π‘ Count number of unique dist_name
Tanz_raw %>%
count(dist_name) %>%
print()# A tibble: 148 Γ 2
dist_name n
<chr> <int>
1 Arumeru 105
2 Arusha 75
3 Babati 105
4 Bagamoyo 73
5 Bahi 45
6 Bariadi 75
7 Biharamulo 45
8 Buhigwe 45
9 Bukoba 74
10 Bukombe 43
# βΉ 138 more rows
3.2 Geospatial: Converting Multipolygon to Polygon
As the district level boundary GIS data in multipolygon form will cause overlays or intersections, it is more intuitive to calculate the area of each polygon and group them together by the unique district name and then select the largest polygon by area.
π‘ Calculate the area of each polygon
sf_GB <- GB %>%
st_cast("POLYGON") %>%
mutate(area = st_area(.))Warning in st_cast.sf(., "POLYGON"): repeating attributes for all
sub-geometries for which they may not be constant
π‘ Group and select the largest polygon by area
GB_clean <- sf_GB %>%
group_by(shapeName) %>%
filter(area == max(area)) %>%
ungroup() %>%
select(-area) %>%
select(shapeName)3.3 Aspatial: Filtering columns for information required
As the FinScope Tanzania 2023 dataset contains 721 variables and not all variables will be used in our analysis, we will filter for the necessary independent and dependent variables based to analyse on:
(a) how cluster type, gender and access to technology affects emergency savings, and
(b) how cluster type, gender and access to technology affects borrowing.
Tanz_filter <- Tanz_raw %>%
select(dist_name, clustertype, c8c, c9, c23__1,c23__2,,e_7_n_3,F4_1__3,F4_1__5,F4_1__8,F4_1__13,F4_1__14,
g_2_3,g_5_1__1,g_5_1__2,g_5_1__3,g_5_1__5,g_5_1__6,g_5_1__7,
g_5_1__8,g_5_1__9,g_5_1__10,g_5_1__11,g_5_1__12,g_5_1__13,
ins3_1__3,ins3_1__9,Household_weight,population_wt)3.4 Aspatial: Rename Columns
Tanz_filter <- Tanz_filter %>%
rename(
Age = c8c,
Gender = c9,
Access_MobilePhone = c23__1,
Access_Internet = c23__2,
Savings_Emerg = e_7_n_3,
Savings_SACCOS = F4_1__3,
Savings_MobileWallet = F4_1__5,
Savings_Group = F4_1__8,
Savings_Cash = F4_1__13,
Savings_Lifestock = F4_1__14,
Borrow_Amount = g_2_3,
Borrow_Bank = g_5_1__1,
Borrow_MFI = g_5_1__2,
Borrow_SACCOS = g_5_1__3,
Borrow_Gov = g_5_1__5,
Borrow_MM = g_5_1__6,
Borrow_Pension = g_5_1__7,
Borrow_Employer = g_5_1__8,
Borrow_Fam_Pay = g_5_1__9,
Borrow_Fam_NoPay = g_5_1__10,
Borrow_SavGroup = g_5_1__11,
Borrow_ML = g_5_1__12,
Borrow_Religious = g_5_1__13,
Insurance_Health = ins3_1__3,
Insurance_Life = ins3_1__9
)3.5 Aspatial: Convert Yes/No to a numeric vector
Tanz_filter$clustertype <- as.numeric(Tanz_filter$clustertype == "Urban")
Tanz_filter$Gender <- as.numeric(Tanz_filter$Gender == "Female")
Tanz_filter$Access_MobilePhone <- as.numeric(Tanz_filter$Access_MobilePhone == "Yes")
Tanz_filter$Access_Internet <- as.numeric(Tanz_filter$Access_Internet == "Yes")
Tanz_filter$Savings_Emerg <- as.numeric(Tanz_filter$Savings_Emerg == "Yes")
Tanz_filter$Savings_SACCOS <- as.numeric(Tanz_filter$Savings_SACCOS == "Yes")
Tanz_filter$Savings_MobileWallet <- as.numeric(Tanz_filter$Savings_MobileWallet == "Yes")
Tanz_filter$Savings_Group <- as.numeric(Tanz_filter$Savings_Group == "Yes")
Tanz_filter$Savings_Cash <- as.numeric(Tanz_filter$Savings_Cash == "Yes")
Tanz_filter$Savings_Lifestock <- as.numeric(Tanz_filter$Savings_Lifestock == "Yes")
Tanz_filter$Borrow_Bank <- as.numeric(Tanz_filter$Borrow_Bank == "Yes")
Tanz_filter$Borrow_MFI <- as.numeric(Tanz_filter$Borrow_MFI == "Yes")
Tanz_filter$Borrow_SACCOS <- as.numeric(Tanz_filter$Borrow_SACCOS == "Yes")
Tanz_filter$Borrow_Gov <- as.numeric(Tanz_filter$Borrow_Gov == "Yes")
Tanz_filter$Borrow_MM <- as.numeric(Tanz_filter$Borrow_MM == "Yes")
Tanz_filter$Borrow_Pension <- as.numeric(Tanz_filter$Borrow_Pension == "Yes")
Tanz_filter$Borrow_Employer <- as.numeric(Tanz_filter$Borrow_Employer == "Yes")
Tanz_filter$Borrow_Fam_Pay <- as.numeric(Tanz_filter$Borrow_Fam_Pay == "Yes")
Tanz_filter$Borrow_Fam_NoPay <- as.numeric(Tanz_filter$Borrow_Fam_NoPay == "Yes")
Tanz_filter$Borrow_SavGroup <- as.numeric(Tanz_filter$Borrow_SavGroup == "Yes")
Tanz_filter$Borrow_ML <- as.numeric(Tanz_filter$Borrow_ML == "Yes")
Tanz_filter$Borrow_Religious <- as.numeric(Tanz_filter$Borrow_Religious == "Yes")
Tanz_filter$Insurance_Health <- as.numeric(Tanz_filter$Insurance_Health == "Yes")
Tanz_filter$Insurance_Life <- as.numeric(Tanz_filter$Insurance_Life == "Yes")Tanz_filter$Borrow_Amount <- ifelse(is.na(Tanz_filter$Borrow_Amount), 0, Tanz_filter$Borrow_Amount)
Tanz_filter$Borrow_Amount <- as.numeric(Tanz_filter$Borrow_Amount)Warning: NAs introduced by coercion
Tanz_filter[is.na(Tanz_filter)] <- 0head(Tanz_filter,5)# A tibble: 5 Γ 29
dist_name clustertype Age Gender Access_MobilePhone Access_Internet
<chr> <dbl> <dbl> <dbl> <dbl> <dbl>
1 Misungwi 0 47 1 1 1
2 Missenyi 0 63 1 0 0
3 Kyela 1 74 0 1 0
4 Kongwa 1 29 1 1 0
5 Ilala 1 53 0 1 0
# βΉ 23 more variables: Savings_Emerg <dbl>, Savings_SACCOS <dbl>,
# Savings_MobileWallet <dbl>, Savings_Group <dbl>, Savings_Cash <dbl>,
# Savings_Lifestock <dbl>, Borrow_Amount <dbl>, Borrow_Bank <dbl>,
# Borrow_MFI <dbl>, Borrow_SACCOS <dbl>, Borrow_Gov <dbl>, Borrow_MM <dbl>,
# Borrow_Pension <dbl>, Borrow_Employer <dbl>, Borrow_Fam_Pay <dbl>,
# Borrow_Fam_NoPay <dbl>, Borrow_SavGroup <dbl>, Borrow_ML <dbl>,
# Borrow_Religious <dbl>, Insurance_Health <dbl>, Insurance_Life <dbl>, β¦
3.6 Aspatial: Replace dist_name to avoid empty polygons
For the dist_name that cannot match to the GIS data for Tanzania, it will be replaced by the values of its Mother District (for coast) or the nearest city.
Tanz_join <- Tanz_filter %>%
mutate(dist_name = case_when(
dist_name == "Tanganyika" ~ "Tanga Urban",
dist_name == "Kigamboni" ~ "Temeke",
dist_name == "Arumeru" & clustertype == "1" ~ "Arusha Urban",
dist_name == "Arumeru" & clustertype == "0" ~ "Arusha",
dist_name == "Butiama" ~ "Butiam",
dist_name == "Dodoma" ~ "Dodoma Urban",
dist_name == "Tanga" ~ "Tanga Urban",
dist_name == "Malinyi" ~ "Morogoro",
dist_name == "Magharibi B"~ "Magharibi",
dist_name == "Magharibi A"~ "Magharibi",
dist_name == "Ubungo"~ "Kinondoni",
dist_name == "Tabora"~ "Tabora Urban",
dist_name == "Kibiti"~ "Mkuranga",
TRUE ~ dist_name # Keep all other values unchanged
))3.7 Perform relational join
Tanz_final <- left_join(Tanz_join,GB_clean,
by = c("dist_name" = "shapeName"), relationship = "many-to-many")π‘ Check for any empty polygons and class
empty_polygons <- Tanz_final %>%
filter(st_is_empty(geometry)) %>%
select(dist_name) %>%
distinct()class(Tanz_final)[1] "tbl_df" "tbl" "data.frame"
3.8 Writing, reading data file to rds
write_rds(Tanz_final, "data/rds/Tanz_final.rds")Tanz_final <- read_rds("data/rds/Tanz_final.rds")4 Non-Spatial Regression Methods
4.1 Check for multi-collinearity
Tanz_corr <- Tanz_final %>%
st_drop_geometry()
corrplot::corrplot(cor(Tanz_corr[,2:27]),
diag = FALSE,
order = "AOE",
tl.pos = "td",
tl.cex = 0.5,
method = "number",
type = "upper")
Since all the correlation values are < 0.8, there is no signs of multi-collinearity.
4.2 Build a non-spatial multi-linear regression
4.2.1 How is borrowing amount affected by various independent variables?
## need to log the borrowing amount to normalize extreme values
Tanz_final$Borrow_Amount <- log(Tanz_final$Borrow_Amount + 1)Borrowing_mlr <- lm(
Borrow_Amount ~ clustertype + Age + Gender + Access_MobilePhone + Access_Internet + Borrow_Bank + Borrow_MFI +
Borrow_SACCOS + Borrow_Gov + Borrow_MM + Borrow_Pension + Borrow_Employer + Borrow_Fam_Pay + Borrow_Fam_NoPay +
Borrow_SavGroup + Borrow_ML + Borrow_Religious,
data=Tanz_final)
summary(Borrowing_mlr)
Call:
lm(formula = Borrow_Amount ~ clustertype + Age + Gender + Access_MobilePhone +
Access_Internet + Borrow_Bank + Borrow_MFI + Borrow_SACCOS +
Borrow_Gov + Borrow_MM + Borrow_Pension + Borrow_Employer +
Borrow_Fam_Pay + Borrow_Fam_NoPay + Borrow_SavGroup + Borrow_ML +
Borrow_Religious, data = Tanz_final)
Residuals:
Min 1Q Median 3Q Max
-41.226 -0.853 -0.566 0.660 17.137
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.374761 0.099271 3.775 0.000161 ***
clustertype 0.095177 0.055219 1.724 0.084806 .
Age 0.002801 0.001508 1.858 0.063193 .
Gender -0.314786 0.050635 -6.217 5.28e-10 ***
Access_MobilePhone 0.388865 0.072849 5.338 9.61e-08 ***
Access_Internet 0.188531 0.058460 3.225 0.001264 **
Borrow_Bank 8.752231 0.226625 38.620 < 2e-16 ***
Borrow_MFI 9.294665 0.192160 48.369 < 2e-16 ***
Borrow_SACCOS 8.621757 0.359141 24.007 < 2e-16 ***
Borrow_Gov 6.495580 0.570177 11.392 < 2e-16 ***
Borrow_MM 2.848696 0.199602 14.272 < 2e-16 ***
Borrow_Pension -3.805797 1.114491 -3.415 0.000641 ***
Borrow_Employer 3.609994 0.265147 13.615 < 2e-16 ***
Borrow_Fam_Pay 8.767819 0.057526 152.414 < 2e-16 ***
Borrow_Fam_NoPay 0.842179 0.215213 3.913 9.17e-05 ***
Borrow_SavGroup 8.695528 0.099584 87.318 < 2e-16 ***
Borrow_ML 7.036817 0.164615 42.747 < 2e-16 ***
Borrow_Religious 8.537908 1.007762 8.472 < 2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 2.467 on 9897 degrees of freedom
Multiple R-squared: 0.7967, Adjusted R-squared: 0.7963
F-statistic: 2281 on 17 and 9897 DF, p-value: < 2.2e-16
Adjusted R-squared of ~80% (0.7963) indicates a strong model fit and no overfitting.
The model shows that the source of borrowing strongly influenced the borrowing amounts. Tanzanians tend to borrow from micro-finance institutions (MFI) and family/religious groups, but less from Pension (negative estimate). The negative estimate for gender also indicates that males borrow less than females (one-hot encoding female = 1).
The significance of access to mobile phones to borrowing amount suggests that financial accessibility may be linked to mobile connectivity.
ols_vif_tol(Borrowing_mlr) Variables Tolerance VIF
1 clustertype 0.9039700 1.106231
2 Age 0.9743801 1.026293
3 Gender 0.9701335 1.030786
4 Access_MobilePhone 0.9324815 1.072407
5 Access_Internet 0.8764393 1.140980
6 Borrow_Bank 0.9751965 1.025434
7 Borrow_MFI 0.9861093 1.014086
8 Borrow_SACCOS 0.9874746 1.012684
9 Borrow_Gov 0.9868489 1.013326
10 Borrow_MM 0.9760810 1.024505
11 Borrow_Pension 0.9801367 1.020266
12 Borrow_Employer 0.9811711 1.019190
13 Borrow_Fam_Pay 0.9617857 1.039733
14 Borrow_Fam_NoPay 0.9584516 1.043350
15 Borrow_SavGroup 0.9833790 1.016902
16 Borrow_ML 0.9826744 1.017631
17 Borrow_Religious 0.9990478 1.000953
ols_plot_resid_fit(Borrowing_mlr)
ols_plot_resid_hist(Borrowing_mlr)
mlr.output <- as.data.frame(Borrowing_mlr$residuals) Borrowing.res.sf <- cbind(Tanz_final,
MLR_RES = Borrowing_mlr$residuals)Borrowing.sf <- st_as_sf(Borrowing.res.sf)
Borrowing.sp <- as_Spatial(Borrowing.sf)
Borrowing.spclass : SpatialPolygonsDataFrame
features : 9915
extent : 122650.6, 1317290, 8697326, 9890957 (xmin, xmax, ymin, ymax)
crs : +proj=utm +zone=36 +south +datum=WGS84 +units=m +no_defs
variables : 30
names : dist_name, clustertype, Age, Gender, Access_MobilePhone, Access_Internet, Savings_Emerg, Savings_SACCOS, Savings_MobileWallet, Savings_Group, Savings_Cash, Savings_Lifestock, Borrow_Amount, Borrow_Bank, Borrow_MFI, ...
min values : Arusha, 0, 16, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...
max values : Wete, 1, 100, 1, 1, 1, 1, 1, 1, 1, 1, 1, 18.4206807539524, 1, 1, ...
tmap_mode("view")tmap mode set to interactive viewing
tm_shape(GB_clean)+
tmap_options(check.and.fix = TRUE) +
tm_polygons(alpha = 0.4) +
tm_shape(Borrowing.sp) +
tm_dots(col = "MLR_RES",
alpha = 0.6,
style = "quantile")Variable(s) "MLR_RES" contains positive and negative values, so midpoint is set to 0. Set midpoint = NA to show the full spectrum of the color palette.